What Is Claimed Is: 

1. A method for selecting a node to host a primary server for a service
from a plurality of nodes in a distributed computing system, the method
comprising:

receiving an indication that a state of the distributed computing system has
changed;

in response to the indication, determining if there is already a node hosting
the primary server for the service; and

if there is not already a node hosting the primary server, selecting a node to
host the primary server based upon rank information for the nodes.
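
The three steps of claim 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the claimed implementation: the function name `select_primary`, the `ranks` mapping of live node names to ranks, and the convention that a larger integer means a higher rank are all assumptions made for the example.

```python
from typing import Dict, Optional


def select_primary(ranks: Dict[str, int], current_primary: Optional[str]) -> str:
    """Return the node that should host the primary after a state change.

    `ranks` maps each live node to its rank (assumption: larger = higher).
    `current_primary` is the last known primary host, or None.
    """
    # Determine whether a live node is already hosting the primary server.
    if current_primary is not None and current_primary in ranks:
        return current_primary
    # No primary host remains, so select one based upon rank information.
    return max(ranks, key=ranks.get)
```

Calling `select_primary({"a": 1, "b": 3}, None)` picks `"b"` under these assumptions, while passing a surviving primary returns it unchanged. An empty `ranks` mapping raises `ValueError`, which a caller would need to handle.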



2. The method of claim 1, wherein selecting the node to host the
primary server involves:

assuming that a given node from the plurality of nodes in the distributed
computing system hosts the primary server,

communicating rank information between the given node and other nodes
in the distributed computing system, wherein each node in the distributed
computing system has a unique rank with respect to the other nodes in the
distributed computing system,

comparing a rank of the given node with a rank of the other nodes in the
distributed computing system, and

if one of the other nodes in the distributed computing system has a higher
rank than the given node, disqualifying the given node from hosting the primary
server.

3. The method of claim 2, further comprising, if there exists a node
that is configured to host the primary server, allowing the node that is configured
to host the primary server to communicate with other nodes in the distributed
computing system in order to disqualify the other nodes from hosting the primary
server.
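
The rank exchange of claims 2 and 3 might look like the following Python sketch. It simulates the exchange in a single process rather than over a network; `run_election`, the `ranks` mapping, and the `configured_primary` parameter are hypothetical names, and larger integers are assumed to denote higher ranks.

```python
from typing import Dict, List, Optional


def run_election(
    ranks: Dict[str, int], configured_primary: Optional[str] = None
) -> List[str]:
    """Return the nodes still qualified after one exchange of rank information."""
    if configured_primary in ranks:
        # Claim 3: a node already configured as primary disqualifies all others.
        return [configured_primary]
    qualified = []
    for node, rank in ranks.items():
        # Claim 2: the node assumes it hosts the primary, then compares its
        # rank with the rank of every other node.
        other_ranks = (r for n, r in ranks.items() if n != node)
        if all(rank > r for r in other_ranks):
            qualified.append(node)  # no other node has a higher rank
    return qualified
```

Because the claims require each rank to be unique, exactly one node survives the comparison in this sketch, for example `run_election({"a": 1, "b": 3, "c": 2})` leaves only `"b"`.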



4. The method of claim 2, wherein assuming that the given node
hosts the primary server involves:

maintaining a candidate variable in the given node identifying a candidate
node to host the primary server; and

initially setting the candidate variable to identify the given node.
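
A minimal sketch of the candidate variable of claim 4 follows. The class `Node`, its `receive_rank` method, and the `disqualified` property are assumptions introduced for illustration; the claims do not specify how the variable is updated, so the update rule shown (replace the candidate when a higher rank arrives) is one plausible reading.

```python
class Node:
    """Tracks which node this node currently believes should host the primary."""

    def __init__(self, name: str, rank: int) -> None:
        self.name = name
        self.rank = rank
        # Claim 4: the candidate variable initially identifies the node itself.
        self.candidate = name
        self.candidate_rank = rank

    def receive_rank(self, sender: str, sender_rank: int) -> None:
        # Assumed update rule: a higher-ranked sender becomes the candidate.
        if sender_rank > self.candidate_rank:
            self.candidate = sender
            self.candidate_rank = sender_rank

    @property
    def disqualified(self) -> bool:
        # The node is out of the running once its candidate is another node.
        return self.candidate != self.name
```

In this sketch a node with rank 2 keeps itself as candidate after hearing from a rank-1 node, and switches candidate (becoming disqualified) after hearing from a rank-3 node.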



5. The method of claim 1, further comprising, after a new node has
been selected to host the primary server, if the new node is different from a
previous node that hosted the primary server, establishing connections for the
service to the new node.






6. The method of claim 1, further comprising, after a new node has
been selected to host the primary server, if the new node is different from a 
previous node that hosted the primary server, configuring the new node to host the 
primary server for the service. 



7. The method of claim 1, further comprising restarting the service if
the service was interrupted as a result of the change in state of the distributed
computing system.

8. The method of claim 2, wherein the given node in the distributed
computing system acts as one of:

a host for the primary server for the service;

a host for a secondary server for the service, wherein the secondary server
periodically receives checkpointing information from the primary server; and

a spare for the primary server, wherein the spare does not receive
checkpointing information from the primary server.



9. The method of claim 8, further comprising, upon initial startup of
the service, selecting a highest ranking spare to host the primary server for the
service.



10. The method of claim 8, further comprising allowing the primary 
server to configure spares in the distributed computing system to host secondary
servers for the service. 

11. The method of claim 8, wherein comparing the rank of the given
node with the rank of the other nodes in the distributed computing system
involves considering a host for the primary server to have a higher rank than a
host for a spare, and considering a host for a secondary server to have a higher
rank than a spare.
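
One way to picture the role-aware comparison of claims 8 and 11 is a composite rank in which the node's role is compared first and its unique node rank breaks ties. The sketch below is hypothetical: the `Role` enumeration, the `effective_rank` helper, and the assumed ordering primary > secondary > spare are illustrative choices, not language from the claims.

```python
from enum import IntEnum
from typing import Tuple


class Role(IntEnum):
    """Roles from claim 8, ordered so that a larger value outranks a smaller."""

    SPARE = 0      # receives no checkpointing information
    SECONDARY = 1  # periodically receives checkpoints from the primary
    PRIMARY = 2


def effective_rank(role: Role, node_rank: int) -> Tuple[int, int]:
    """Combine role and unique node rank into one comparable value.

    Tuples compare element by element, so the role dominates and the
    node rank only decides between nodes that share a role.
    """
    return (int(role), node_rank)
```

With this encoding a primary host with a low node rank still outranks any secondary host, and any secondary host outranks any spare, regardless of their node ranks.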



12. The method of claim 2, wherein disqualifying the given node from
hosting the primary server involves ceasing to communicate rank information
between the given node and the other nodes in the distributed computing system.

13. A computer-readable storage medium storing instructions that
when executed by a computer cause the computer to perform a method for
selecting a node to host a primary server for a service from a plurality of nodes in
a distributed computing system, the method comprising:

receiving an indication that a state of the distributed computing system has
changed;

in response to the indication, determining if there is already a node hosting
the primary server for the service; and

if there is not already a node hosting the primary server, selecting a node to
host the primary server based upon rank information for the nodes.



14. The computer-readable storage medium of claim 13, wherein
selecting the node to host the primary server involves:

assuming that a given node from the plurality of nodes in the distributed
computing system hosts the primary server,

communicating rank information between the given node and other nodes
in the distributed computing system, wherein each node in the distributed
computing system has a unique rank with respect to the other nodes in the
distributed computing system,

comparing a rank of the given node with a rank of the other nodes in the
distributed computing system, and

if one of the other nodes in the distributed computing system has a higher
rank than the given node, disqualifying the given node from hosting the primary
server.

15. The computer-readable storage medium of claim 14, wherein if
there exists a node that is configured to host the primary server, the method
further comprises allowing the node that is configured to host the primary server
to communicate with other nodes in the distributed computing system in order to
disqualify the other nodes from hosting the primary server.

16. The computer-readable storage medium of claim 14, wherein
assuming that the given node hosts the primary server involves:

maintaining a candidate variable in the given node identifying a candidate 
node to host the primary server; and 

initially setting the candidate variable to identify the given node. 



17. The computer-readable storage medium of claim 13, wherein after
a new node has been selected to host the primary server, if the new node is 
different from a previous node that hosted the primary server, the method further 
comprises establishing connections for the service to the new node. 



18. The computer-readable storage medium of claim 13, wherein after
a new node has been selected to host the primary server, if the new node is 
different from a previous node that hosted the primary server, the method further 
comprises configuring the new node to host the primary server for the service. 

19. The computer-readable storage medium of claim 13, wherein the
method further comprises restarting the service if the service was interrupted as a 
result of the change in state of the distributed computing system. 

20. The computer-readable storage medium of claim 14, wherein the
given node in the distributed computing system acts as one of:

a host for the primary server for the service;

a host for a secondary server for the service, wherein the secondary server
periodically receives checkpointing information from the primary server; and

a spare for the primary server, wherein the spare does not receive
checkpointing information from the primary server.

21. The computer-readable storage medium of claim 20, wherein upon
initial startup of the service, the method further comprises selecting a highest
ranking spare to host the primary server for the service.

22. The computer-readable storage medium of claim 20, wherein the
method further comprises allowing the primary server to configure spares in the
distributed computing system to host secondary servers for the service.

23. The computer-readable storage medium of claim 20, wherein
comparing the rank of the given node with the rank of the other nodes in the
distributed computing system involves considering a host for the primary server to
have a higher rank than a host for a spare, and considering a host for a secondary
server to have a higher rank than a spare.

24. The computer-readable storage medium of claim 14, wherein
disqualifying the given node from hosting the primary server involves ceasing to
communicate rank information between the given node and the other nodes in the
distributed computing system.

25. An apparatus that selects a node to host a primary server for a
service from a plurality of nodes in a distributed computing system, the apparatus
comprising:

a receiving mechanism that is configured to receive an indication that a
state of the distributed computing system has changed;

a determination mechanism that is configured to determine if there is
already a node hosting the primary server for the service in response to the
indication;

a selecting mechanism, wherein if there is not already a node hosting the
primary server, the selecting mechanism is configured to select a node to host the
primary server based upon rank information for the nodes.



26. The apparatus of claim 25, wherein, in selecting a node to host the
primary server based upon rank information, the selecting mechanism is
configured to:

communicate rank information between the given node and other nodes in
the distributed computing system, wherein each node in the distributed computing
system has a unique rank with respect to the other nodes in the distributed
computing system; and to

compare a rank of the given node with a rank of the other nodes in the
distributed computing system.






27. The apparatus of claim 26, further comprising a disqualification 
mechanism that is configured to disqualify the given node from hosting the
primary server if one of the other nodes in the distributed computing system has a 
higher rank than the given node. 






28. The apparatus of claim 26, further comprising a mechanism on the
primary server that is configured to communicate with other nodes in the
distributed computing system in order to disqualify the other nodes from hosting
the primary server.

29. The apparatus of claim 26, wherein the selecting mechanism is
configured to:

maintain a candidate variable in the given node identifying a candidate
node to host the primary server; and to

initially set the candidate variable to identify the given node.

30. The apparatus of claim 25, further comprising a connection
mechanism that is configured to establish connections for the service to a new
node after the new node has been selected to host the primary server, and if the
new node is different from a previous node that hosted the primary server.

31. The apparatus of claim 25, further comprising a mechanism that
configures a new node to host the primary server for the service, after the new
node has been selected to host the primary server, and if the new node is different
from a previous node that hosted the primary server.

32. The apparatus of claim 25, further comprising a restarting
mechanism that is configured to restart the service if the service was interrupted as
a result of the change in state of the distributed computing system.

33. The apparatus of claim 26, wherein the given node in the
distributed computing system acts as one of:

a host for the primary server for the service;

a host for a secondary server for the service, wherein the secondary server
periodically receives checkpointing information from the primary server; and

a spare for the primary server, wherein the spare does not receive
checkpointing information from the primary server.

34. The apparatus of claim 33, further comprising an initialization
mechanism wherein during initialization of the service, the initialization
mechanism is configured to select a highest ranking spare to host the primary
server for the service.

35. The apparatus of claim 33, further comprising a promotion
mechanism on the primary server that is configured to promote spares in the
distributed computing system to host secondary servers for the service.

36. The apparatus of claim 33, wherein while comparing the rank of
the given node with the rank of the other nodes in the distributed computing
system, the selecting mechanism is configured to consider a host for the primary
server to have a higher rank than a host for a secondary server, and to consider a
host for a secondary server to have a higher rank than a spare.

37. The apparatus of claim 26, wherein the selecting mechanism is
configured to cease to communicate rank information between the given node and
the other nodes in the distributed computing system after the given node is
disqualified by the disqualification mechanism.

38. A method for selecting a node to host a primary server for a service
from a plurality of nodes in a distributed computer system, comprising:

communicating disqualification information between the node and
remaining nodes in the plurality of nodes;

disqualifying the node from hosting the primary server based upon the
disqualification information received from the remaining nodes.

39. The method of claim 38, wherein the disqualification information
comprises a node rank information.



40. The method of claim 39, wherein the node rank for a given node is
calculated using an assumption that the given node hosts the primary server.

41. The method of claim 40, wherein the calculated node rank is
unique with respect to the ranks of other nodes in the distributed computer system.

42. The method of claim 39, wherein the disqualifying of the node
comprises:

comparing a rank of the node to a set of ranks of the remaining nodes in
the distributed computer system; and

disqualifying the node from hosting the primary server if one of the set of
ranks of the remaining nodes is higher than the rank of the node.



43. The method of claim 38, further comprising repeating the acts of
communicating disqualification information and disqualifying the node for at least
one more node in the plurality of nodes.